DeepConversion: Voice conversion with limited parallel training data

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Voice conversion based on Gaussian processes by coherent and asymmetric training with limited training data

Voice conversion (VC) is a technique aiming to mapping the individuality of a source speaker to that of a target speaker, wherein Gaussian mixture model (GMM) based methods are evidently prevalent. Despite their wide use, two major problems remains to be resolved, i.e., over-smoothing and over-fitting. The latter one arises naturally when the structure of model is too complicated given limited ...

متن کامل

Voice Conversion Using Exclusively Unaligned Training Data

Although all conventional voice conversion approaches require equivalent training utterances of source and target speaker, several recently proposed applications call for breaking this demand. In this paper, we present an algorithm which finds corresponding time frames within unaligned training data. The performance of this algorithm is tested by means of a voice conversion framework based on l...

متن کامل

Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks

We propose a parallel-data-free voice conversion (VC)method that can learn a mapping from source to target speech without relying on parallel data. The proposed method is generalpurpose, high quality, and parallel-data-free, which works without any extra data, modules, or alignment procedure. It is also noteworthy that it avoids over-smoothing, which occurs in many conventional statistical mode...

متن کامل

Text-independent F0 transformation with non-parallel data for voice conversion

In voice conversion, a simple frame-level mean and variance normalization is typically used for fundamental frequency (F0) transformation, which is text-independent and requires no parallel training data. Some advanced methods transform pitch contours instead, but require either parallel training data or syllabic annotations. We propose a method which retains the simplicity and text-independenc...

متن کامل

A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences

We extend our recently proposed approach to cross-lingual TTS training to voice conversion, without using parallel training sentences. It employs Speaker Independent, Deep Neural Net (SIDNN) ASR to equalize the difference between source and target speakers and Kullback-Leibler Divergence (KLD) to convert spectral parameters probabilistically in the phonetic space via ASR senone posterior probab...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Speech Communication

سال: 2020

ISSN: 0167-6393

DOI: 10.1016/j.specom.2020.05.004